ABSTRACT

In light of the three-year data release from WMAP we re-examine the evidence for 
the "Axis of Evil" (AOE). We discover that previous statistics are not robust with 
respect to the data-sets available and different treatments of the galactic plane. We 
identify the cause of the instability and implement an alternative "model selection" 
approach. A comparison to Gaussian isotropic simulations finds the features significant
at the 94-98% level, depending on the particular AOE model. The Bayesian evidence
finds lower significance, ranging from "substantial" at Δ(ln E) ~ 1.4, to no evidence
for the most general AOE model.

Key words: cosmic microwave background 



1 INTRODUCTION 

The Wilkinson Microwave Anisotropy Probe (WMAP) has
produced spectacular high resolution all-sky observations
of the Cosmic Microwave Background (CMB), which have
bolstered the case for the ΛCDM concordance cosmological
model (Spergel et al. 2003, 2006). After the release of the
first-year results (Bennett et al. 2003) there was a flurry of
studies into the Gaussianity and statistical isotropy of the
data, as these are fundamental predictions of inflation
theories. Reports of something awry have been obtained
using a variety of techniques (e.g., Park (2004); Eriksen et al.
(2004a); Hansen et al.; Donoghue & Donoghue (2004);
Land & Magueijo; Eriksen et al. (2004b); Vielva et al. (2004)).
In this paper we focus on anomalies in the largest scale
modes, after it was first noted that the quadrupole (ℓ = 2)
and octopole (ℓ = 3) appeared to be correlated
(de Oliveira-Costa et al. 2004), and their power is suspiciously
low. Much work has focussed on the alignment and
"planarity" of these two multipoles (Copi et al. 2006;
Schwarz et al. 2004; Ralston & Jain 2004); but in
Land & Magueijo (2005b) it was seen that the alignment
actually extends to the four multipoles ℓ = 2-5, along the
axis (b, l) ≈ (60, -100). This feature has been dubbed the
"axis of evil" (AOE).

To be more precise the AOE expression has come to
signify various different things. Generally it is intended to
denote any form of statistical anisotropy, i.e., a feature in
the CMB fluctuations which picks a preferred direction. This
can be realized in many ways, e.g., multipole planarity (the
dominance of m = ±ℓ modes along the preferred axis), or a
more general form of m-preference. In this respect it must
be said that while everyone agrees on the presence of the
"axis of evil" in the data, its extent is still debated. The
expression is also sometimes associated with the low power
in the low ℓs. This is quite inappropriate: while low power
may be related to the AOE (see Land & Magueijo (2006))
there is nothing "axial" or anisotropic in a power spectrum
anomaly per se.

There are two possible fault lines in the analysis leading 
to the "axis of evil" effect. The first concerns the integrity 
of the data itself, i.e., contamination from noise, system- 
atics and foregrounds. Comparison between the first-year 
(WMAP1) and third-year (WMAP3) data releases shows 
that the raw data has hardly changed on large scales. How- 
ever there are several "all-sky" renditions of the data and 
these do lead to significant disparities: in this paper we show 
that this is true regarding the intensity of the AOE, so that 
discussions should emphasise not so much first- versus third-year
data, but the various treatments of the galactic plane region.

The second fault line concerns the "meaning" of the 
detection, and by this we mean the robustness of the statis- 
tics used, and whether there is support for planarity or more 
general m-preference. The frequentist formalism provides no 
clean way to penalise for extra parameters or to weigh-up 
the detections against each other, or the null hypothesis.
Instead simulations are used to assess how likely it is to get
such a feature in a Gaussian statistically isotropic (SI) CMB
sky, but selection effects (by which we mean the tuning of
the statistic or model to the data) are hard to account for.
Here the confrontation of Bayesianism and frequentism
becomes a very practical matter.

We carry out this project as follows. In Section 2 we
re-examine the original frequentist AOE results for various
renditions of the WMAP1 and WMAP3 data, and we discuss
further the limitations of the original frequentist method,
such as its lack of robustness, at least with regards to
m-preference AOE (as opposed to "planarity"). In Section 3 we
follow a model comparison approach, and find that this is
much more robust when confronted with the different
data-sets. Further we compare the evidence for the models:
planarity and more complex m-preference. In Section 4 we
summarise and discuss the results.



2 INSTABILITIES OF THE FREQUENTIST 
STATISTICS 



To assign an axis to each multipole, de Oliveira-Costa et al.
(2004) proposed the following statistic:

$$q_\ell = \max_{\mathbf{n}} \sum_m m^2 \, |a_{\ell m}(\mathbf{n})|^2 \qquad (1)$$

where the a_ℓm are computed in the frame with z-axis in
direction n. This selects the frame dominated by the planar
m = ±ℓ modes.

In Land & Magueijo (2005b) we generalized this statistic
to allow for any m domination, i.e., not restricting ourselves
to planar configurations, with the statistic:

$$r_\ell = \max_{m,\mathbf{n}} \frac{C_{\ell m}(\mathbf{n})}{\sum_{m'} |a_{\ell m'}|^2} \qquad (2)$$

where C_ℓ0(n) = |a_ℓ0|², and C_ℓm(n) = 2|a_ℓm|² for m > 0
(notice that 2 modes contribute for m ≠ 0), for the a_ℓm
computed in the frame with z = n. This produces three
important quantities for each multipole: the direction n_ℓ,
the "shape" m_ℓ, and the ratio r_ℓ of the multipole's power
absorbed by the mode m_ℓ in the direction n_ℓ.
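As a minimal sketch of the inner step of this statistic, the snippet below evaluates C_ℓm and the power ratio in a single fixed frame. The search over directions n, which requires rotating the a_ℓm, is deliberately left out; the function name `dominant_mode` and the input coefficients are hypothetical and purely illustrative.

```python
def dominant_mode(alm):
    """alm: complex a_lm for m = 0..l in one fixed frame.

    Returns (m_best, ratio): the m with the most power and the
    fraction of the multipole's power that it absorbs.
    """
    # C_l0 = |a_l0|^2; C_lm = 2|a_lm|^2 for m > 0 (the +m and -m modes both contribute)
    power = [abs(a) ** 2 if m == 0 else 2.0 * abs(a) ** 2
             for m, a in enumerate(alm)]
    total = sum(power)  # equals the sum over all m' of |a_lm'|^2
    m_best = max(range(len(power)), key=power.__getitem__)
    return m_best, power[m_best] / total


# Hypothetical quadrupole coefficients (m = 0, 1, 2), for illustration only
m_best, ratio = dominant_mode([0.5 + 0j, 0.2 - 0.1j, 1.0 + 1.0j])
print(m_best, round(ratio, 3))
```

The full statistic would repeat this evaluation for every trial direction and keep the largest ratio, which is where the discontinuous behaviour discussed below can enter.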

We extend the work of Land & Magueijo (2005b) by
applying this statistic to the following data-sets:

• The WMAP mission (Bennett et al. 2003) produced full
sky CMB maps from ten differencing assemblies (DAs).
They also produced an "internal linear combination" (ILC)
map. This assumes no external information about the
foregrounds and combines smoothed frequency maps with
weights chosen to minimize the rms fluctuations, using
separate sets of weights for 12 disjoint sky regions. In the
first-year data release the WMAP collaboration advised
that the ILC map be used only as a visual tool. However,
for the third-year release a thorough error analysis of
the ILC map was performed, and a bias correction
implemented (Hinshaw et al. 2006). The resulting third-year ILC
map (herein WMAP3) is expected to be clean enough on
scales ℓ ≲ 10 to be used without a mask. WMAP data is
available from http://lambda.gsfc.nasa.gov.

• Third-party maps include those of Tegmark et al. (2003),
who produced their own ILC map. As above, an "internal"
method is employed assuming only a black-body spectrum
for the CMB, but now the weights depend on scale (in
harmonic space) as well as galactic latitude. This is
advantageous because different sources of contamination dominate
at different scales: foregrounds at large scales, and noise
at smaller scales. As well as the cleaned map, a Wiener
filtered map is produced that, through a comparison with the
WMAP best estimates of the theoretical C_ℓ, adjusts the power of
the map so as to suppress noisy fluctuations. We use their first
(TOH1) and third-year (TOH3) cleaned maps, all available
from www.hep.upenn.edu/~max/wmap.html.

• In an analysis of the ILC map-making
method, Eriksen et al. (2004) proposed a faster algorithm
for the computation of the weights, that employs
Lagrange multipliers to linearize the problem. Although
this produces identical results to that of the WMAP
team, and is indeed the method employed by the WMAP
collaboration for their third-year map, the authors applied
it to the first-year data using slightly different regions, thus
producing a slightly different ILC map (herein LILC1),
available at http://lambda.gsfc.nasa.gov.

There are of course the original frequency maps, which 
require a mask. However, for the task of assessing statistical 
isotropy we require full sky information, and thus we only 
employ these ILC maps. 

In Table 1 we list the results obtained with the frequentist
AOE statistic (2) for the various data-sets. It is clear
that this statistic is not robust: very similar maps can
give very different results, as indicated by the final column.
The expected inter-angle for isotropic axes is 1 radian
(≈ 57°), thus a mean of ≈ 22° is remarkably low and a
comparison to simulations puts this at the 99.9% confidence
level (Land & Magueijo 2005b). However, this result only
holds for two of the maps, and a small fluctuation in just
one multipole makes the n_ℓ jump elsewhere. This highlights
one weakness of this statistic: its discontinuous nature.
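The mean inter-angle can be checked directly from the axes listed in Table 1. The sketch below assumes that the axes are headless (n and -n are equivalent), so each pairwise angle lies in [0°, 90°], and that the mean is taken over all six pairs of the four multipoles; the helper names are illustrative. It also evaluates numerically the isotropic expectation, the integral of θ sin θ over [0, π/2], which is 1 radian.

```python
import math
from itertools import combinations


def unit_vector(b_deg, l_deg):
    """Unit vector for galactic latitude b and longitude l (degrees)."""
    b, l = math.radians(b_deg), math.radians(l_deg)
    return (math.cos(b) * math.cos(l), math.cos(b) * math.sin(l), math.sin(b))


def mean_inter_angle(axes):
    """Mean pairwise angle in degrees between headless axes given as (b, l)."""
    vecs = [unit_vector(b, l) for b, l in axes]
    angles = []
    for u, v in combinations(vecs, 2):
        dot = abs(sum(x * y for x, y in zip(u, v)))  # an axis has no sign
        angles.append(math.degrees(math.acos(min(1.0, dot))))
    return sum(angles) / len(angles)


# Axes for l = 2..5 as listed in Table 1
toh1 = [(58.5, -102.9), (62.1, -120.6), (57.6, -163.3), (48.6, -93.4)]
lilc1 = [(0.9, 156.7), (63.0, -126.9), (56.7, -163.7), (48.6, -94.7)]
print(round(mean_inter_angle(toh1), 1), round(mean_inter_angle(lilc1), 1))

# Isotropic expectation: integral of theta * sin(theta) over [0, pi/2], midpoint rule
n = 100000
h = (math.pi / 2) / n
expected = sum((i + 0.5) * h * math.sin((i + 0.5) * h) for i in range(n)) * h
print(round(expected, 4))
```

Comparing the two maps shows how a jump of the quadrupole axis alone moves the mean from the low TOH1 value to one close to the isotropic expectation.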

In Fig. 1 we visualize how "close calls" may arise,
explaining the discontinuities of the results in Table 1. For
the quadrupole and octopole of the TOH1 map, we plot the
power ratio at the position n

$$R_\ell(\mathbf{n}) = \max_m \frac{C_{\ell m}(\mathbf{n})}{\sum_{m'} |a_{\ell m'}|^2} \qquad (3)$$

Thus the "axis of evil" statistic (2) picks out the position
of the hottest spot from these maps (note the degeneracy
between m = 1, 2 for ℓ = 2; we avoid this in
practice by taking just the m = 2 solution). Below the R_ℓ
maps we plot the associated m picked by R_ℓ for a given n.
We can now diagnose the instabilities in Table 1 by
identifying close calls in the competition for the hottest spot.
For the quadrupole the m = 0 and m = 2 modes, and for
the octopole the m = 1 and m = 3 modes are fighting a
close battle. The overall mean inter-angle (which measures
the strength of the AOE) depends closely on this battle,
and thus the instability of this statistic. We should stress
that the instabilities identified here do not seem to plague
statistics for planarity (Magueijo & Sorkin 2006).



3 MODEL COMPARISON 

The instabilities discovered appear to be cured by a model
comparison treatment, which allows for an evaluation of the
evidence for m-preference in ℓ = 2-5, over simple planarity
for ℓ = 2, 3. Rather than computing a statistic from
the maps (e.g., the mean inter-angle between the n for the
various ℓ), the idea is to assess the "evidence" for a model
         ℓ = 2                ℓ = 3                ℓ = 4                ℓ = 5                Mean
Map      (b, l)           m   (b, l)           m   (b, l)           m   (b, l)           m   inter-θ
LILC1    (0.9, 156.7)     0   (63.0, -126.9)   3   (56.7, -163.7)   2   (48.6, -94.7)    3   51.4
TOH1     (58.5, -102.9)   2   (62.1, -120.6)   3   (57.6, -163.3)   2   (48.6, -93.4)    3   22.4
TOH3     (76.5, -134.0)   2   (27.0, 51.9)     1   (35.1, -130.6)   1   (47.7, -94.7)    3   53.8
WMAP3    (2.7, -26.5)     0   (62.1, -122.6)   3   (34.2, -131.2)   1   (47.7, -96.0)    3   53.7

Table 1. The axes, in galactic coordinates (b, l), and m that maximise r_ℓ for the multipoles ℓ = 2-5, for various all-sky renditions
of the first and third-year WMAP data. Note the low mean inter-angle values for the TOH1 map, which indicate a strong correlation
between the multipoles (i.e., the AOE). The discontinuous nature of the statistic causes the results to vary widely.






Figure 1. The power ratio R_ℓ(n) in the dominating m mode (above), and the m value (below) for the quadrupole (left) and octopole
(right). The "axis of evil" statistic in (2) searches for the hottest spot in these maps. We can see the close calls that cause the results to
vary widely in Table 1. Plotted in galactic coordinates and Mollweide projection.



encoding m-preference or planarity, compared to the base 
model of statistical isotropy. We first outline the general 
formalism. 

Let ℒ be the likelihood of the data given a model, and
k the number of parameters of this model. The parameters
should be tuned so as to maximize the likelihood, or
equivalently, to minimize the information in the data given the
theory (defined as I(D|T) = -ln(ℒ)). However the real
evidence should refer to the information in the data and the
theory: I(D ∩ T) = I(D|T) + I(T), where the information
in the theory, I(T), provides a penalization related to the
number of parameters. This matter is behind the "Occam's
razor" rationale (Magueijo & Sorkin 2006), and the information
criteria (Liddle 2004). According to the Akaike information
criterion (AIC), the information in a theory is simply
the number of parameters, k. In fact, we will use a more
accurate form, which is especially important for small sample
size, I_AIC = k + k(k + 1)/(N - k - 1), where N is the number of data
points being fit (Burnham & Anderson 2006).



An alternative approach to the problem of penalization
is to compute the Bayesian evidence,

$$E = \int \mathcal{L}(D|\theta, M)\, \Pi(\theta)\, d\theta = P(D|M), \qquad (4)$$

where Π are the priors on the parameters θ for the model in
question (see e.g., Trotta (2005) for a review). Bayes'
theorem tells us how this is related to the probability of a model
P(M|D), and it provides an effective penalization by
computing the average of the likelihood over this expanded
parameter space. As an approximation to the logarithm of the
Bayes factor, B = E_1/E_0, we will compute the Bayesian
information criterion (BIC), I_BIC = (k/2) ln N (confusingly this is
not actually related to information-theoretic methods).

The evidence H for a theory T_1 is then defined as the
decrease in the information of data and theory when it is
compared with a null hypothesis T_0:

$$H = I(D \cap T_0) - I(D \cap T_1) = H_f - H_p, \qquad (5)$$
Data-set   b     l      ε      H_f    H_AIC   H_BIC   ln B
LILC1      63    -120   .042   6.51   2.01    2.78    1.36
TOH1       61    -113   .032   7.48   2.98    3.75    1.85
TOH3       74    -129   .018   6.97   2.47    3.24    1.27
WMAP3      64    -123   .043   6.49   1.99    2.76    1.32

Table 2. The maximum likelihood improvement, H_f, and best-fitting
parameters for the planarity model (i.e., 3 extra parameters),
from various all-sky renditions of the WMAP data. We
consider the evidence from AIC and BIC methods as well as the
Bayes factor (ln B = Δ ln E).

where H_f measures the improvement in the fit, H_f =
ln(ℒ_1) - ln(ℒ_0), and H_p is the extra penalization we have in
our new theory.

In the language of the Jeffreys' scale (Jeffreys 1961;
Liddle et al. 2006), ln(B), or H, between 1 and 2.5 signals
substantial evidence, between 2.5 and 5 signals strong
evidence, and "decisive" evidence requires ln(B) > 5. However,
for these rules of thumb to apply to the IC methods, various
conditions should be met. For example the AIC assumes
Gaussianity of the likelihood with respect to the parameters,
while the BIC assumes independent identically
distributed data points. We will therefore compare these results
to those from statistically isotropic Gaussian simulations, in
Section 3.3. We will also compute the Bayes factor, for
comparison with the BIC approximation, and the frequentist
results.
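The thresholds quoted above can be written as a small lookup, sketched below and applied to the ln B values of Table 2. The function names are hypothetical, the label for values below 1 is our own wording, and the conversion to a posterior model probability assumes equal prior probabilities for the two models, which the text does not state.

```python
import math


def jeffreys_label(ln_b):
    """Classify ln(B) (or H) using the thresholds quoted in the text."""
    if ln_b > 5.0:
        return "decisive"
    if ln_b >= 2.5:
        return "strong"
    if ln_b >= 1.0:
        return "substantial"
    return "below substantial"


def posterior_probability(ln_b):
    """P(M1|D) for two models with equal prior probability (an assumption)."""
    return 1.0 / (1.0 + math.exp(-ln_b))


# ln(B) for the planarity model, taken from Table 2
table2_ln_b = {"LILC1": 1.36, "TOH1": 1.85, "TOH3": 1.27, "WMAP3": 1.32}
for name, ln_b in table2_ln_b.items():
    print(name, jeffreys_label(ln_b), round(posterior_probability(ln_b), 2))
```

Under the equal-prior assumption a ln B near 1.4 corresponds to odds of roughly four to one, which gives a feel for what "substantial" means on this scale.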



3.1 Planarity model 

It was shown in Magueijo & Sorkin (2006) that the planarity
of the ℓ = 2, 3 multipoles is supported by a Bayesian
analysis. The model used to assess the evidence for planarity is
based on the diagonal covariance matrix:

$$\langle |a_{\ell m}|^2 \rangle(\mathbf{n}) = c_\ell \left( \delta_{\ell |m|} + \epsilon\,(1 - \delta_{\ell |m|}) \right) \qquad (6)$$

where n and ε are the free parameters of the model (in
addition to c_ℓ, which is common with the isotropic model but
takes a different value), with ε ≤ 1. We use the same ε and n for
both multipoles, so that k = 3, N = 12, H_p^AIC = 4.5, and
H_p^BIC = 3.73. In Table 2 we list the parameter values that
maximize the likelihood, together with H_f and H following
the AIC and BIC methods.
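The two penalties quoted here follow from the small-sample AIC form, k + k(k + 1)/(N - k - 1), and the BIC form, (k/2) ln N. The sketch below evaluates both; the function names are illustrative. It also evaluates the seven-parameter m-preference case with N = 5 + 7 + 9 + 11 = 32 modes for ℓ = 2-5, where the value of N is our assumption, chosen because it reproduces the penalizations of 9.33 and 12.13 quoted in Section 3.2.

```python
import math


def aic_penalty(k, n):
    """Small-sample AIC penalty in information units: k + k(k+1)/(n-k-1)."""
    return k + k * (k + 1) / (n - k - 1)


def bic_penalty(k, n):
    """BIC penalty in information units: (k/2) ln n."""
    return 0.5 * k * math.log(n)


# Planarity model: k = 3 extra parameters, N = 12 modes (l = 2, 3)
print(round(aic_penalty(3, 12), 2), round(bic_penalty(3, 12), 2))

# m-preference model: k = 7 extra parameters, N = 32 modes (assumed, l = 2..5)
print(round(aic_penalty(7, 32), 2), round(bic_penalty(7, 32), 2))

# Evidence H = H_f - H_p for the WMAP3 planarity fit (H_f = 6.49 in Table 2)
print(round(6.49 - aic_penalty(3, 12), 2), round(6.49 - bic_penalty(3, 12), 2))
```

The last line recovers the H_AIC and H_BIC entries of Table 2 for WMAP3, which is a useful consistency check on how the table columns are built.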

We also compute the Bayesian evidence and record
ln(B) in Table 2. We do this via brute force integration,
and for the base model (⟨|a_ℓm|²⟩ = c_ℓ) we use a uniform
prior on c_ℓ: 0 ≤ c_ℓ ℓ(ℓ + 1)/(2π) ≤ 3000 μK². For the
planarity model we use uniform priors on ε ∈ [0, 1], and on c_ℓ:
0 ≤ c_ℓ ℓ(ℓ + 1)/(2π) ≤ (2ℓ + 1) × 3000 μK², with the further
constraint c̄_ℓ ℓ(ℓ + 1)/(2π) ≤ 3000 μK², where c̄_ℓ is the
average c̄_ℓ = Σ_m ⟨|a_ℓm|²⟩/(2ℓ + 1).

As before (Magueijo & Sorkin 2006) we find that variations
between different galactic plane treatments lead to only
small variation in H_f. However, different evidence measures
reach different conclusions. All the measures find at least
substantial evidence for the planarity model, however the
AIC and BIC appear to significantly overestimate this
evidence compared to the ln(B) result. We refer to Section 3.3
ℓ    m'   b     l      ε       H_f
2    0    6     157    0.027   3.47
2    2    59    -103   0.030   3.09
3    3    62    -120   0.025   5.06
4    2    58    -163   0.041   5.07
4    0    43    -98    0.043   4.02
5    3    49    -93    0.026   7.65

Table 3. The maximum likelihood improvement, H_f, for a
dominating m-mode model in the TOH1 map, where each multipole
can select its own axis, ε, and m'. Where there is a close call, the
runner up m' is also listed.

for a frequentist assessment of significance, through an anal- 
ysis of Hf from simulations. 

3.2 General m-preference model 

Using the same formalism we now revisit the debate on the
extent of the AOE, i.e., m-preference as opposed to planarity.
In the Bayesian formalism the matter can be addressed
by replacing the covariance matrix (6) by

$$\langle |a_{\ell m}|^2 \rangle(\mathbf{n}) = c_\ell \left( \delta_{m' |m|} + \epsilon\,(1 - \delta_{m' |m|}) \right) \qquad (7)$$

where n, ε and m'(ℓ) are the free parameters of the model,
with ε ≤ 1. We find that if we analyze each ℓ separately we
rediscover the instabilities reported in Section 2. In Table 3
we take TOH1 for definiteness, and present the winning m',
its associated (b, l) and H_f, and also the runner up in cases
where we get close calls in maximising H_f. We see that the
Bayesian analysis, in this set up, merely confirms the ℓ = 2,
m' = 0, 2 and the ℓ = 4, m' = 0, 2 instabilities.

However, a totally new perspective into these instabilities
now makes itself known. H_f only becomes the real evidence
H after it is degraded by the penalization H_p, related
to the number of parameters of the model. If we allow each
ℓ to choose its own parameters then the overall H_f is large
(the sum total) but the penalization is prohibitive as each
multipole has 3 parameters. Thus in optimizing H we wish
to reduce the number of parameters by always seeking a
common axis n for all ℓ in (7). This immediately removes
the instabilities found in the frequentist formalism, by
effectively penalizing for jumping between close calls, when one
choice leads to a better common set of parameters.

Take for example ℓ = 2. We have that m' = 0, 2 are close
competitors in the optimization of H_f, however only m' = 2
picks an axis that is roughly aligned with the preferred axis
for the other multipoles. So only m' = 2 permits a large
saving in H_p (H_p = 2 per axis, using, say, the AIC) with
only small deterioration in H_f. An instability would only
arise if m' = 0 improved H_f by an extra 2 when compared
with m' = 2. The penalization forces the multipoles to choose
common parameters, at the risk of decreasing the fit a little.
Thus, in order to maximize H, and not only H_f, we should
select a common n for ℓ = 2-5, and the complete result
(for the same data-set) is presented in Table 4.
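The ℓ = 2 trade-off can be made explicit with the TOH1 numbers from Tables 3 and 4. This is only a sketch of the bookkeeping: it takes the penalty of 2 per independent axis quoted in the text at face value and ignores any other change in the parameter count; the variable names are illustrative.

```python
# H_f values for l = 2 in the TOH1 map
h_f_m0_own_axis = 3.47      # m' = 0 on its own axis (Table 3)
h_f_m2_own_axis = 3.09      # m' = 2 on its own axis (Table 3)
h_f_m2_common_axis = 2.33   # m' = 2 on the common axis (Table 4)
axis_penalty = 2.0          # H_p per independent axis under the AIC, as quoted

# Gain in fit from letting l = 2 jump to the m' = 0 solution
gain_vs_own = h_f_m0_own_axis - h_f_m2_own_axis
gain_vs_common = h_f_m0_own_axis - h_f_m2_common_axis

# The jump only pays off if the gain exceeds the cost of an extra axis
for label, gain in [("own axis", gain_vs_own), ("common axis", gain_vs_common)]:
    net = gain - axis_penalty
    print(label, round(gain, 2), round(net, 2))
```

In both comparisons the gain in H_f falls short of the penalty for an extra axis, so the penalized evidence prefers the m' = 2 solution aligned with the other multipoles.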

In order to mimic the full treatment
in Magueijo & Sorkin (2006) we should also seek a
common ε, thus reducing the number of parameters further.
ℓ     m'   b     l     ε       H_f
2     2    49    -96   0.052   2.33
3     3    49    -96   0.108   2.01
4     0    49    -96   0.058   3.21
5     3    49    -96   0.028   7.34
2-5        49    -96           14.89

Table 4. The maximum likelihood improvement,
H_f, for a dominating m-mode
in the TOH1 map, where each multipole
can select its own ε and m', but a common
favoured axis is found.



Data-set   b     l      ε      m's          H_f     H_AIC   H_BIC   ln B    ln B_23
LILC1      48    -100   .077   2, 3, 0, 3   11.46   2.13    -0.67   -1.43   -0.17
TOH1       49    -96    .051   2, 3, 0, 3   14.54   5.21    2.41    0.80    0.11
TOH3       48    -97    .073   2, 3, 0, 3   11.57   2.24    -0.56   -1.15   -0.21
WMAP3      48    -100   .072   2, 3, 0, 3   12.10   2.77    -0.03   -1.01   -0.18

Table 5. The maximum likelihood improvement, H_f, for the m-preference model
with a common axis and common ε between the four multipoles ℓ = 2-5, and the
variable m', for various data-sets (i.e., 7 extra parameters). We consider the evidence
H using AIC, BIC methods, as well as the Bayes factor (ln B). We also compute the
Bayes factor for just ℓ = 2, 3 (ln B_23).



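This can be done via the method of Lagrange multipliers,
i.e., by maximizing

$$H = -\sum_{\ell} \sum_{i} \frac{N_{\ell i}}{2} \left[ \frac{s_{\ell i}^2}{\sigma_{\ell i}^2} + \ln \sigma_{\ell i}^2 \right] - \lambda_1 \left[ \sigma_{21}^2 \sigma_{32}^2 - \sigma_{31}^2 \sigma_{22}^2 \right] - \lambda_2 \left[ \sigma_{31}^2 \sigma_{42}^2 - \sigma_{41}^2 \sigma_{32}^2 \right] - \lambda_3 \left[ \sigma_{41}^2 \sigma_{52}^2 - \sigma_{51}^2 \sigma_{42}^2 \right] \qquad (8)$$

where i = 1, 2 indexes the sub-samples for the m-modes with
the large and small variance respectively, with N_ℓi modes
and sample variance s_ℓi². The solutions for the variance σ_ℓi²
are constrained such that σ_ℓ2²/σ_ℓ1² = ε, to fit with our model
(7). This has solution

$$\sigma_{\ell i}^2 = \frac{s_{\ell i}^2}{1 + 2\alpha_{\ell i}/N_{\ell i}}$$

with α_2i = ±A, α_3i = ±(-A + B), α_4i = ±(-B + C) and
α_5i = ∓C, where A, B, C are solutions of the 3 quadratic
equations expressing ε_2 = ε_3 = ε_4 = ε_5.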
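When the common-ε constraint is dropped, the maximisation decouples and each multipole has a closed-form answer. The sketch below covers only that unconstrained, single-multipole case in a fixed frame, and is not the constrained solution above. It assumes independent zero-mean Gaussian real coefficients split into a large-variance group (the preferred m) and a small-variance group, and the function name and inputs are hypothetical. Writing s² for a group's sample variance, the maximum of the Gaussian log-likelihood is reached at σ² = s², which gives H_f = (N/2) ln s²_all - Σ_i (N_i/2) ln s²_i.

```python
import math


def likelihood_improvement(big, small):
    """H_f for one multipole in a fixed frame, without the common-epsilon constraint.

    big   : real coefficients assigned the large variance (the preferred m)
    small : remaining real coefficients, with variance epsilon times smaller
    Returns (H_f, epsilon), enforcing epsilon <= 1.
    """
    n1, n2 = len(big), len(small)
    s1 = sum(x * x for x in big) / n1    # sample variance, zero mean assumed
    s2 = sum(x * x for x in small) / n2
    if s2 >= s1:                         # constraint epsilon <= 1 is active
        return 0.0, 1.0
    s_all = (n1 * s1 + n2 * s2) / (n1 + n2)
    # ln L_max = -sum_i (N_i / 2) (1 + ln s_i^2) + const; the constants cancel
    h_f = 0.5 * ((n1 + n2) * math.log(s_all)
                 - n1 * math.log(s1) - n2 * math.log(s2))
    return h_f, s2 / s1


# Hypothetical quadrupole: 2 real modes in the preferred m, 3 in the rest
h_f, eps = likelihood_improvement([3.0, -3.2], [0.5, -0.4, 0.6])
print(round(h_f, 2), round(eps, 3))
```

The constrained problem differs in that the ratio s2/s1 is no longer free per multipole, so the fitted variances are pulled away from the sample variances by the multiplier terms.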

The results are presented in Table 5. For all of the
data-sets the chosen m's are the same (as opposed to the
frequentist statistics), and the preferred common axis is remarkably
robust. The common parameter ε and H_f are also reasonably
stable. Thus as far as choice of statistics versus available
data-sets are concerned we have found an improved formalism
and a robust set of best-fitting parameter values.

To compute the Bayes factor we use the same priors
as before, with uniform priors on the additional {m'}
parameters, and we record ln(B) in Table 5. The AIC and
BIC introduce penalizations of 9.33 and 12.13 respectively.
Regrettably at this point we see that the options for
penalization spoil the party, with the Bayes factor and BIC
finding no evidence for the m-preference model (except for
TOH1), while the AIC favors the m-preference model over
the base model, and the planarity model (except for TOH3).
We should perhaps not be overly disheartened by all this
discord. It is far from peculiar to the AOE effect: see for
example the rather disparate conclusions regarding evidence
against scale-invariance (n_s = 1) as reported in Liddle
(2007).

We note that the BIC gives us a simple tool to
examine the effect of priors. If, for example, the model has a
built in positive mirror parity (de Oliveira-Costa et al. 1996;
Starobinsky 1993; Land & Magueijo 2005c), the number of
possible m' values is reduced, leading to a lower penalization
(10.12) for the same H_f (only mirror positive modes
are found in the data). This improvement of 2 in the H_BIC
values will push the BIC (and probably the ln(B)) result to
favor this particular "positive reflection parity" model over
the base model. But such a prior should be physically
motivated.



3.3 Simulations 

To assess (in a frequentist way) the significance of the
maximum likelihood values, H_f, in Tables 2 and 5 we compare
our results to those from simulations. We stress that this is
an alternative to the Bayesian method, for which the
evidence is completely summarised by the Bayes factor, ln(B),
with significance determined by the Jeffreys' scale. The
frequentist approach to model selection in this case involves
simulating data for the base model (Gaussian statistically
isotropic (SI) CMB) and computing our H_f "goodness of
fit" statistic for the proposed models (equations (6) and (7)).
We then obtain frequency plots for H_f which indicate how
well one would expect the proposed models to fit Gaussian
SI CMB data. If the WMAP data finds a significantly better
fit then we can conclude that the data is unlikely to be from
a Gaussian SI model, at some confidence level (CL).
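The logic of this frequentist test can be sketched with a toy Monte Carlo. The example below is a simplification and not the analysis of this paper: it uses a single multipole with five unit-variance Gaussian coefficients in a fixed frame, with no maximisation over the axis, ε shared between multipoles, or m', and the "observed" value is hypothetical. All function names are illustrative.

```python
import math
import random


def h_f_fixed_frame(coeffs, n_big):
    """Likelihood improvement when the first n_big modes get their own variance."""
    big, small = coeffs[:n_big], coeffs[n_big:]
    s1 = sum(x * x for x in big) / len(big)
    s2 = sum(x * x for x in small) / len(small)
    if s2 >= s1:  # epsilon <= 1 constraint: no improvement over isotropy
        return 0.0
    s_all = (len(big) * s1 + len(small) * s2) / len(coeffs)
    return 0.5 * (len(coeffs) * math.log(s_all)
                  - len(big) * math.log(s1) - len(small) * math.log(s2))


def fraction_higher(observed, simulated):
    """Fraction of simulations whose statistic exceeds the observed value."""
    return sum(1 for h in simulated if h > observed) / len(simulated)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sims = [h_f_fixed_frame([rng.gauss(0.0, 1.0) for _ in range(5)], n_big=2)
        for _ in range(10000)]
p = fraction_higher(3.0, sims)  # 3.0 is a hypothetical observed H_f
print(f"{100 * p:.2f}% of simulations exceed the observed H_f")
```

Because the real statistic is maximised over the model parameters for every simulated sky, its null distribution sits at higher H_f than in this toy, which is why the comparison must use the same maximisation for data and simulations.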

We use 10,000 Gaussian SI simulations, with the latest
WMAP best fit ΛCDM power spectrum, to find the
distribution of H_f for the planarity model and the m-preference
model. We plot histograms of the results in Fig. 2. This
approach provides us with an alternative measure of the
significance of our H_f values, and in Fig. 2 we list the
percentage of the simulations that find a higher H_f value. We see
that the planarity model consistently finds significance at
the 98% level. The m-preference model generally has lower
significance, at the 94-96% level, except for the TOH1 map
which finds very strong evidence for the m-preference model,
at > 99% level. Note that it is this map that finds the
m-preference AOE with the original statistic (see Table 1).

These results are in agreement with the Bayesian
approach (ln(B) and BIC), as the planarity model is favoured
over the more general m-preference model except for TOH1.
However, the Bayesian approach generally finds lower
evidence for these models compared to the base model, and
it actually finds no evidence for the m-preference model
(except for TOH1, see Table 5). This reflects the well known fact that
the Bayesian approach to model selection tends to set a
higher threshold than frequentist approaches (e.g., Trotta
(2005); Mukherjee et al. (2006); Linder & Miquel (2007)).
Which result is more "correct" is a matter of personal
opinion, however the more conservative Bayesian approach is
often preferred in the field of cosmological model selection.
Data-set   H_f^P   %      H_f^m   %
LILC1      6.51    2.69   11.46   6.90
TOH1       7.48    1.37   14.54   0.53
TOH3       6.97    2.02   11.57   6.35
WMAP3      6.49    2.71   12.10   4.21

[Histogram panels; horizontal axes: H_f]

Figure 2. The distribution of H_f returned by 10,000 Gaussian and isotropic simulations for the planarity model (left) and the general
m-preference model (middle). We also plot the result obtained by the WMAP3 map (short dashed line). In the Table we list the percentage
of simulations that find higher H_f values for the planarity model (P) and the m-preference model (m). We stress that this approach
does not take account of the relative complexities of the models.



A disadvantage of the Bayesian approach is its sensitiv- 
ity to priors, and its insensitivity to useless parameters that 
are unconstrained by the data. However, the frequentist ap- 
proach can involve a large amount of computational time 
and can be prone to selection effects (i.e., using a statis- 
tic pre-tuned by the data). Consider that we could always 
choose some convoluted complex statistic for which our data 
returns anomalously high (or low) values, compared to the 
simulations. Only the Bayesian approach can help here in 
imposing a suitable penalization, by averaging the likelihood 
over the extra parameter space. This ensures that a model is 
preferred only if the improvement in the fit merits opening 
up this extra dimension of parameter space. 

The IC method provides another way of penalizing for 
the extra parameters, however we see that the AIC generally 
prefers the m-preference model (with the most parameters) 
to the planarity or base model - in disagreement with both 
the Bayesian and frequentist approach. 



4 CONCLUSIONS 

We have highlighted weaknesses with the original AOE
statistic (2) that probed m-preference for ℓ = 2-5. These
are primarily: 1) lack of robustness: small changes in the
data produce very different best-fitting parameter values,
i.e., the statistics are discontinuous; 2) variations with
data-set: it is hard to connect varying results to imperfections in
the data or the statistic; 3) the need for simulations to
assess significance: no way of penalizing for extra parameters
or comparing competing theories on an equal footing, e.g.,
planarity versus general m-preference.

We have found an improved formalism by employing a 
model selection approach, which cures the instabilities by 
favouring common parameters between the multipoles. The 
original instabilities were due to the existence of multiple 
solutions for a given multipole. But bringing in a
penalization related to the number of parameters of the model
enforces "Occam's razor" and selects solutions where
parameters are common between the multipoles. We now find
the best-fitting parameter values are robust.

The model selection approach also allows assessment
of the relative Bayesian evidence (ln B) for the planarity
model (correlation between ℓ = 2, 3, m' = ℓ modes) and
the m-preference model (a correlation between ℓ = 2-5, m'
not restricted). This extends the work of Magueijo & Sorkin
(2006) where the low-ℓ low-power evidence was assessed, as
well as planarity for some data-sets.

Using the Bayes factor, and the BIC approximation, 
we find that there is substantial evidence for the planarity 
model, but no evidence for the m-preference model. We also 
take a frequentist approach to the problem, and compare 
the "goodness of fit" (Hf) to those from Gaussian SI sim- 
ulations. In agreement with the Bayesian approach, we find 
stronger evidence for the planarity model (~ 98% CL), than 
for the m-preference model (~ 95% CL). These results are in 
contradiction with the AIC approach which finds evidence 
for both models, and generally stronger evidence for the m- 
preference model. We think this demonstrates a weakness of 
this crude statistic, that does not appear to penalize enough 
for extra parameters. 

The m-preference model is a more general version of 
the planarity model. It is therefore not surprising that the 
evidence for the planarity model is higher, as the parameter 
space is smaller while still including the best fitting model 
(m' = £). Likewise, we could restrict the m' parameters to 
positive mirror parity modes and find a higher Bayes factor. 
But without a theoretical motivation for restricting the m' 
parameters to these values it could be argued that this
approach involves tuning our model (or equivalently, the
priors) to fit the data. Therefore, the lower significance (~ 95%)
result for the m-preference model is our more conservative
result for the significance of the AOE in the WMAP
third-year data. Note that the Bayes factor finds no support for
this model, in multipoles ℓ = 2-5, nor for just ℓ = 2, 3 (see
last column of Table 5).

The higher significance returned by the simulations, 
compared to the Bayes factor, highlights an important dif- 
ference between the Bayesian and frequentist approaches 
to model comparison. For some confidence level, the ln(B) 
threshold and frequentist Hf threshold can disagree, with 
the Bayesian approach tending to be the more conservative 
- a phenomenon not unheard of when discussing "2-sigma" 
results. 